Abstract

In this work we present a new method to separate the stochastic and deterministic information contained in an electrocardiogram (ECG), which may provide new sources of information for diagnostic purposes. We assume that the ECG carries information corresponding to many different processes related to cardiac activity, as well as contamination from different sources related to the measurement procedure and to the nature of the observed system itself. The method starts with the application of an improved archetypal analysis to separate the stochastic and deterministic information. From the stochastic point of view we analyze Renyi entropies, and from the deterministic point of view we calculate the autocorrelation function and the corresponding correlation time. We show that healthy and pathologic information may be stochastic and/or deterministic, can be identified by different measures, and may be located in different parts of the ECG.

1. Introduction 

An electrocardiogram (ECG) is a time series of measurements of one observable of a complex system: the surface electric potential measured between two poles around the heart. The location of the poles depends on the lead in use [1]. We may consider that the complex system under study is the cardiac activity, which consists of several processes and is also related to many other processes, including neural, mechanical and hormonal ones. Therefore, the ECG contains mixed information from different sources and on different time scales. The nature of the useful information, deterministic or stochastic, that can be extracted from an ECG depends on the characteristics of the corresponding underlying process, on the measurement process, and on our ability to detect deterministic information and to differentiate it from stochastic information. The separation of these two kinds of information is difficult, in particular for ECGs and other time series of physiological origin. In general applications, a priori knowledge of the information and/or of the contamination facilitates this separation, but neither is well defined in the case of physiological signals [2,3]. Traditionally, the information contained in an ECG has clinical relevance when it is visually identified by an expert cardiologist [1]. The stochastic and deterministic information extracted from the ECG may provide new sources of information that cannot be identified visually and, therefore, complementary information, not accessible by other means, to assess the quality of the cardiac activity using the ECG as the only source of information.
In this work we present a method to detect and characterize information that is not visually detectable in the ECG and that can be deterministic or stochastic. We consider that the ECG contains deterministic information, stochastic information and contamination. The first is called deterministic because we assume that, in principle, there is a deterministic model that may represent such information. The stochastic part can be described by models of the evolution of the probability distribution of the possible states of the system. The contamination is a consequence of the real limitations of the ECG as a time series: finite resolution, a finite number of data points, noise sources in the measurement, and nonstationarity.

The method is based on an improved version of archetypal analysis in which a special base of archetypes is constructed to capture relevant characteristics of the ECG. In general, the reconstruction of a signal with archetypes is self-consistent because the base of archetypes is constructed from the signal itself. What is specific to a particular signal is how the archetypal base is constructed. Recognizing the strong but not perfect periodicity of the ECG, we have prepared a particular archetypal base that allows us to overcome the dominant periodicity of the ECG and thus to identify and measure small variations as deterministic and stochastic information. These two kinds of information cannot be detected by visual analysis of the ECG, as is traditionally done by a cardiologist. In this work, instead of a large statistical study with many different ECGs, we perform a controlled numerical experiment on two known ECGs, from a healthy and a pathological case, to detect and measure stochastic and deterministic information. It is important to highlight that the construction of the special base of archetypes does not reduce the detected stochastic and deterministic information to the variability of the R-R intervals. The R-R series is used as an internal system of reference for the archetypes, which ensures the self-consistency of the method. One purpose of the numerically controlled experiment is precisely to show how changes that are not relevant to the visual analysis of an expert, such as a small damping of the T wave, may be clearly measured by this method. This provides a new source of information that is useful for diagnosis and possibly has some predictive power, identifying tendencies of cardiac activity before they become clear pathologies.

In section 2 we present the procedure to filter out the two sources of contamination mentioned above, corresponding to nonstationarity and noise, and the method to separate the deterministic and stochastic information, including the construction of the archetype base. In section 3 we present the application of the method to the ECGs mentioned above in order to characterize the information. Section 4 presents the results, and section 5 contains the discussion and conclusions of this work.

2. Preprocessing and Archetypal Analysis 

Before we study the ECGs we preprocess the signal in order to remove some of the contamination while preserving the rest of the information contained in the ECG. Given the finite resolution and the finite duration of a typical ECG, we consider two sources of contamination that have to be filtered out before any further analysis is performed. We study four ECGs. The first corresponds to a healthy 25-year-old male patient, H. The second is the same healthy ECG but with its T wave smoothed with a normal local average, SP1; this is an apparently healthy ECG, but we call it simulated pathology 1 because of its smoothed T wave. The third ECG, SP2, is also a simulated pathology, obtained from the healthy ECG by suppressing the T wave; this pathology corresponds to myocardial damage [1]. The last ECG corresponds to a real pathologic case, MITP: it is record 100 of the MIT-BIH Arrhythmia Database [4], corresponding to a supraventricular ectopy [1,4].

At the extreme of low frequencies, or large time scales, we observe trends or modulations. These trends are filtered out in the time domain by replacing each data point with the average of itself and its 70 neighbors on each side, weighted with a Gaussian distribution. Since the sampling frequency is 300 Hz, this average covers 141 points, or 0.47 seconds, which corresponds approximately to one half of the time between successive heartbeats. However, because of the Gaussian weights, the number of neighbors that contribute significantly to the smoothing is about 10 on each side of each data point, a time span of approximately 0.07 seconds, which does not compromise any relevant structure of the ECG. This procedure also filters out high-frequency noise observed as fast fluctuations. In figure 1 we present a few seconds of the four ECGs studied in this work. We do not apply any further treatment to filter out other contamination, considering that the information, deterministic and stochastic, may be distributed over most of the frequency range of the power spectrum of the ECG.
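The smoothing step can be sketched as a Gaussian-weighted moving average. This is a minimal sketch in Python with NumPy, not the authors' code: the half-width of 70 samples and the 300 Hz sampling frequency are taken from the text, while the Gaussian width (`sigma = 5` samples) and the edge handling are assumptions, chosen so that roughly 10 neighbors on each side carry significant weight.

```python
import numpy as np

FS = 300          # sampling frequency in Hz (from the text)
HALF_WIDTH = 70   # neighbours on each side of every sample (from the text)


def gaussian_smooth(signal, half_width=HALF_WIDTH, sigma=5.0):
    """Replace each sample by a Gaussian-weighted average of itself and
    `half_width` neighbours on each side.

    sigma (in samples) is an ASSUMED value: with sigma = 5 the weight at
    10 samples from the centre is exp(-2) ~ 0.14 of the central weight,
    and it becomes negligible beyond ~15 samples."""
    offsets = np.arange(-half_width, half_width + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    weights /= weights.sum()                      # weights sum to one
    # Repeat the edge values so that the output has the same length as
    # the input (an assumed boundary treatment).
    padded = np.pad(np.asarray(signal, dtype=float), half_width, mode="edge")
    return np.convolve(padded, weights, mode="valid")


# Window length in seconds: (2 * 70 + 1) / 300 = 0.47 s
window_seconds = (2 * HALF_WIDTH + 1) / FS
```

For a hypothetical raw recording `ecg` sampled at 300 Hz, `gaussian_smooth(ecg)` returns an array of the same length; a symmetric kernel makes the convolution equivalent to the local weighted average described above.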
We first find the R peaks of the ECG. Then we normalize the R-to-R pseudo-periods to a common length equal to the longest R-R distance. Traditional archetypal analysis constructs the set of archetypes from the whole set of R-to-R pseudo-periods. Instead, we sort the R-to-R pseudo-periods by size, from the smallest to the largest, and divide them into five groups. Then we find the set of archetypes for each group. The middle group, the one around the mean value, contains most of the RR intervals, and we take the first three archetypes obtained from this group. Proportionally, we take two archetypes from each of the two groups adjacent to the central group, and one from each of the two external groups (see figure 2). These nine archetypes constitute the archetype base used to reconstruct the corresponding ECG (see figure 3).

The archetypes of each group are estimated as follows [5-7]. Consider a set of multivariate data $\{x_i, i = 1, \dots, n\}$, where each $x_i$ is an $m$-dimensional vector. By means of archetypal analysis we search for a set of $m$-dimensional vectors $z_k$ that characterize the archetypal patterns in the data. The patterns $z_1, \dots, z_p$ are mixtures of the data values $\{x_i\}$. Specifically, let

$$ z_k = \sum_i \beta_{ki} x_i, \qquad \beta_{ki} \ge 0, \quad \sum_i \beta_{ki} = 1 $$

be an archetypal element. The coefficients $\{\alpha_{ik}\}$, $k = 1, \dots, p$, are defined as the minimizers of

$$ \Big\| x_i - \sum_k \alpha_{ik} z_k \Big\|^2, \qquad \alpha_{ik} \ge 0, \quad \sum_k \alpha_{ik} = 1. $$

Finally, we define the archetypal patterns as the mixtures $z_1, \dots, z_p$ that minimize

$$ \sum_i \Big\| x_i - \sum_k \alpha_{ik} z_k \Big\|^2. $$
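The construction of the archetype base can be sketched as follows, assuming NumPy and an already detected list of R-peak sample indices. This is an illustration under stated assumptions, not the authors' implementation: the linear resampling, the equal-width grouping of RR lengths, and the use of alternating projected-gradient steps to solve the archetypal problem are assumptions (the text does not say how the minimization is carried out, nor how the "first" archetypes of a group are ranked), and all helper names are hypothetical.

```python
import numpy as np

# Archetypes taken from each RR group, smallest to largest: 1+2+3+2+1 = 9
ARCHETYPES_PER_GROUP = (1, 2, 3, 2, 1)


def normalize_periods(signal, r_peaks):
    """Resample every R-to-R pseudo-period to the length of the longest one
    (linear interpolation is an assumption)."""
    segments = [np.asarray(signal[a:b], dtype=float)
                for a, b in zip(r_peaks[:-1], r_peaks[1:])]
    lengths = np.array([len(s) for s in segments])
    grid = np.linspace(0.0, 1.0, lengths.max())
    periods = np.array([np.interp(grid, np.linspace(0.0, 1.0, len(s)), s)
                        for s in segments])
    return periods, lengths


def group_by_length(lengths, n_groups=5):
    """Indices of the pseudo-periods in each RR-length group, using
    equal-width bins over the observed range (an assumption)."""
    edges = np.linspace(lengths.min(), lengths.max(), n_groups + 1)
    labels = np.digitize(lengths, edges[1:-1])    # labels 0 .. n_groups-1
    return [np.flatnonzero(labels == g) for g in range(n_groups)]


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto {w >= 0, sum(w) = 1}."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]               # rows sorted descending
    css = np.cumsum(U, axis=1) - 1.0
    cond = U - css / np.arange(1, k + 1) > 0
    rho = k - 1 - np.argmax(cond[:, ::-1], axis=1)  # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(V - theta[:, None], 0.0)


def archetypal_analysis(X, p, n_iter=500, seed=0):
    """Approximately minimize sum_i ||x_i - sum_k alpha_ik z_k||^2 with
    z_k = sum_i beta_ki x_i, using alternating projected-gradient steps.
    Returns archetypes Z (p x m), alpha (n x p) and beta (p x n)."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    beta = rng.random((p, n))
    beta /= beta.sum(axis=1, keepdims=True)       # rows on the simplex
    alpha = rng.random((n, p))
    alpha /= alpha.sum(axis=1, keepdims=True)
    xxt_norm = np.linalg.norm(X @ X.T, 2)
    for _ in range(n_iter):
        Z = beta @ X
        # alpha-step: step 1/L, with L = 2 ||Z Z^T||_2 the gradient's Lipschitz constant
        step_a = 1.0 / (2.0 * np.linalg.norm(Z @ Z.T, 2) + 1e-12)
        alpha = project_rows_to_simplex(alpha + step_a * 2.0 * (X - alpha @ Z) @ Z.T)
        # beta-step: L = 2 ||alpha^T alpha||_2 ||X X^T||_2
        step_b = 1.0 / (2.0 * np.linalg.norm(alpha.T @ alpha, 2) * xxt_norm + 1e-12)
        residual = X - alpha @ beta @ X
        beta = project_rows_to_simplex(beta + step_b * 2.0 * alpha.T @ residual @ X.T)
    return beta @ X, alpha, beta
```

Each block step with step size 1/L cannot increase the loss, so the objective is non-increasing over iterations; applying `archetypal_analysis` to each group returned by `group_by_length`, with the counts in `ARCHETYPES_PER_GROUP`, yields a nine-archetype base in the spirit of figure 2.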
Fig. 1 A few seconds of the ECGs H, SP1, SP2 and MITP.

3. Archetypal Coefficients Analysis 

The reconstruction of each ECG from the archetype base generates a time series of coefficient values for each archetype and each ECG. These values measure the contribution of the corresponding archetypes as they are compared with the successive RR intervals. We analyze the time series of each coefficient for each ECG to measure the stochastic and deterministic information potentially contained in each case.
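Once the nine-archetype base is fixed, the coefficient time series can be obtained by solving, for every normalized RR pseudo-period, the constrained least-squares problem for the $\alpha_{ik}$ of section 2 with the archetypes held fixed. A minimal sketch, assuming NumPy and a projected-gradient solver (an assumption, not necessarily the authors' method); the function names are hypothetical. Column k of the result is the time series of coefficient k, and its mean is the kind of average plotted in figure 4.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto {w >= 0, sum(w) = 1}."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]
    css = np.cumsum(U, axis=1) - 1.0
    cond = U - css / np.arange(1, k + 1) > 0
    rho = k - 1 - np.argmax(cond[:, ::-1], axis=1)
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(V - theta[:, None], 0.0)


def archetype_coefficients(periods, Z, n_iter=300):
    """Coefficients alpha (n_periods x p) that reconstruct each normalized
    RR pseudo-period (row of `periods`) as a convex mixture of the fixed
    archetypes Z (p x m). Column k is the time series of coefficient k."""
    n, p = periods.shape[0], Z.shape[0]
    alpha = np.full((n, p), 1.0 / p)              # start from equal weights
    step = 1.0 / (2.0 * np.linalg.norm(Z @ Z.T, 2))
    for _ in range(n_iter):
        grad = -2.0 * (periods - alpha @ Z) @ Z.T  # gradient of the squared error
        alpha = project_rows_to_simplex(alpha - step * grad)
    return alpha


def coefficient_means(alpha):
    """Mean value of each archetype coefficient over all RR intervals."""
    return alpha.mean(axis=0)
```

Because every row of `alpha` lies on the simplex, each coefficient takes values between 0 and 1, which is the range assumed by the symbolic coding below.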



Fig. 2 RR interval histogram. The horizontal scale indicates the time intervals of the RR series. The numbers on top of the bars indicate the nine archetypes of the archetype base taken from each group.

We first perform a stochastic analysis of these time series using a symbolic representation. The values of any coefficient lie between 0 and 1. We reduce all the possible values to four symbolic values, 0, 1, 2 and 3, corresponding to the ranges 0 to 0.25, 0.25 to 0.5, 0.5 to 0.75 and 0.75 to 1, respectively. This approach reduces the details but highlights the most representative qualitative characteristics. With these four symbols we can construct $4^3 = 64$ words of three symbols, and we then obtain the word probability distribution for each of the 36 time series, that is, for the 9 coefficients obtained from each of the four ECGs analyzed in this work. The Renyi entropies given by

$$ H^{(q)} = \frac{1}{1-q} \log \Big( \sum_i p_i^{q} \Big) \qquad (1) $$

give a quantitative measure of the stochastic information contained in these distributions [8]. From the deterministic point of view, we calculate the autocorrelation function of these 36 time series of coefficients and estimate the correlation time for each case. The autocorrelation function C(t), equation (2), measures time or causal correlations of the values on a time scale t [9,10].

Fig. 4 Averages of the archetypal coefficients for the four ECGs.

Fig. 5 Average Renyi entropies and their standard deviations for q = 4 and q = 1/4 for the four ECGs as indicated.

Fig. 3 Some selected archetypes obtained from the healthy ECG, numbered according to figure 2.

4. Results

In figures 4 to 6 we present the most interesting results of this work. In figure 4 we plot the average of the archetypal coefficients for the four ECGs as indicated in the figure. These averages are the mean values of each coefficient. The mean value of a coefficient is obtained from the values of the coefficient as the corresponding archetype is compared with each RR interval. This procedure measures the contribution of the corresponding archetype to the morphology of the RR interval. Therefore, the mean value of the coefficient of each archetype measures the importance of this archetype for the morphology of the whole ECG.



Figure 5 presents the results of two different Renyi entropies, for q = 1/4 and q = 4. These entropies are measures of disorder in which the small and the large probabilities dominate, respectively. The healthy and T-wave-smoothed ECGs, H and SP1, present larger average entropies and smaller standard deviations for q = 1/4 than the artificial and real pathologic ECGs, SP2 and MITP. These results indicate that the stochastic information is homogeneously distributed in healthy ECGs and that this distribution of stochastic information is lost in the pathologic cases.
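The symbolic coding of the coefficient series and the Renyi entropies of equation (1), for the two orders used in figure 5, can be sketched as follows, assuming NumPy. Overlapping three-symbol windows, the handling of the bin edges and the use of the natural logarithm are assumptions not fixed by the text; the function names are hypothetical.

```python
import numpy as np


def symbolize(coeffs):
    """Map coefficient values in [0, 1] to the symbols 0..3 using the bins
    [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1] (edge handling assumed)."""
    c = np.asarray(coeffs, dtype=float)
    return np.clip(np.floor(4.0 * c), 0, 3).astype(int)


def word_distribution(symbols, length=3):
    """Probabilities of the 4**length words of `length` consecutive symbols,
    counted over overlapping windows (an assumption)."""
    counts = np.zeros(4 ** length)
    for i in range(len(symbols) - length + 1):
        index = 0
        for s in symbols[i:i + length]:
            index = 4 * index + int(s)            # base-4 code of the word
        counts[index] += 1
    return counts / counts.sum()


def renyi_entropy(p, q):
    """Renyi entropy of order q (q != 1) as in equation (1), natural log."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                  # empty words do not contribute
    return np.log(np.sum(p ** q)) / (1.0 - q)
```

For a hypothetical coefficient matrix `alpha` (one column per archetype), `renyi_entropy(word_distribution(symbolize(alpha[:, k])), 0.25)` gives $H^{(1/4)}$ for archetype k; under our reading, the mean and standard deviation of these values over the nine archetypes are the quantities summarized in figure 5. A uniform distribution over the 64 words gives $\log 64$ for every q, the maximum possible value.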
Fig. 6 Autocorrelation times for the 9 archetype coefficients of the four ECGs as indicated.

Figure 6 shows the correlation time t for the nine archetype coefficients of the four ECGs, estimated from equation (4) as the time at which C(t) = C(0)/3. We observe that correlation times are larger for the healthier ECGs, H and SP1, and smaller for the pathologic ECGs, SP2 and MITP. We can also observe appreciable differences between H and SP1, and between SP2 and MITP.
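The correlation times of figure 6 can be estimated as sketched below, assuming NumPy. Since the text does not spell out the estimator, the sketch assumes the usual mean-subtracted autocorrelation with (n - t) normalization and takes the correlation time as the first lag at which C(t) falls to C(0)/3; the function names and the default maximum lag are hypothetical.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Mean-subtracted autocorrelation C(t) for lags t = 0..max_lag,
    each lag averaged over its n - t available products (assumed estimator)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[:n - t], x[t:]) / (n - t)
                     for t in range(max_lag + 1)])


def correlation_time(x, max_lag=None, fraction=1.0 / 3.0):
    """First lag t at which C(t) <= fraction * C(0); None if it never does."""
    if max_lag is None:
        max_lag = len(x) // 4                     # assumed search range
    c = autocorrelation(x, max_lag)
    below = np.flatnonzero(c <= fraction * c[0])
    return int(below[0]) if below.size else None
```

For a pure sinusoid with a period of 40 samples, C(t) is approximately proportional to cos(2πt/40), which first drops below C(0)/3 at t = 8; for uncorrelated noise the correlation time is a single lag. Longer correlation times therefore indicate coefficient series whose values stay related over more RR intervals.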

5. Discussion and Conclusions 

With this work we do not intend to find new physics, nor to develop all the technical details and the standardization of a new diagnostic tool using the ECG as the only source of information. As we understand the scope of applied physics, we present a method that applies new concepts with improved techniques, and we show its potential to help solve actual problems in cardiac diagnostics. For the discussion of the results we have to keep in mind that SP1 actually resembles a healthy case, whereas SP2 does not.

If a given coefficient does not change appreciably along the analyzed ECG, the morphology represented by this particular archetype is invariant throughout the ECG. At the other extreme, if the values change a lot and randomly, the corresponding morphology changes randomly. An intermediate behavior may indicate some order that can be characterized by a stochastic distribution and/or deterministic correlations.
The mean values of $H^{(1/4)}$ for H are large, and a little larger for SP1. The corresponding values for SP2 are small, and still a little smaller for MITP. This means that the morphologies represented by the archetypes are, on average, more disordered for the healthy ECG than for the two pathologic cases, artificial and real, SP2 and MITP. In addition, the standard deviations of $H^{(1/4)}$ are smaller for H and SP1, and larger for SP2 and MITP. This indicates that, for H and SP1, the disorder of the morphologies represented by the archetypes is homogeneously distributed over the nine archetypes of the archetype base, whereas for the pathologic cases the disorder is more localized in some archetypes than in others. Therefore, healthier ECGs present more disorder, that is, higher entropies, than the pathologic ECGs.
The correlation time, obtained from the autocorrelation function of the coefficients, shows that the local morphologies of the healthier ECGs are more correlated than those of the pathologic ECGs. This temporal correlation indicates some deterministic information in the ECG that is more evident in healthier ECGs. In conclusion, we observe that the method presented in this work detects strong evidence of useful stochastic and deterministic information in ECGs. These two kinds of information can differentiate pathologic and healthy ECGs even when the differences cannot be detected by visual analysis. We have succeeded, to some extent, in separating these two kinds of information from the ECG, measuring them for some controlled and known trial cases, and showing the distinctive characteristics of each one as they are extracted from the different ECGs. These results are consistent with the complex nature of cardiac dynamics, in which stochastic and deterministic aspects are both present in a complex mixture. The quality of cardiac dynamics may be characterized using the ECG as the only source of information.